3 research outputs found

    Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

    Get PDF
    In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread cannot therefore directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space, and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.We thank Diego Marr´on for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU’s Horizon2020 research and innovation programme), the Israel Science Fundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space.Peer ReviewedPostprint (author's final draft

    ecoHMEM: Improving object placement methodology for hybrid memory systems in HPC

    Get PDF
    Recent byte-addressable persistent memory (PMEM) technology offers capacities comparable to storage devices and access times much closer to DRAMs than other non-volatile memory technology. To palliate the large gap with DRAM performance, DRAM and PMEM are usually combined. Users have the choice to either manage the placement to different memory spaces by software or leverage the DRAM as a cache for the virtual address space of the PMEM. We present novel methodology for automatic object-level placement, including efficient runtime object matching and bandwidth-aware placement. Our experiments leveraging Intel® Optane™ Persistent Memory show from matching to greatly improved performance with respect to state-of-the-art software and hardware solutions, attaining over 2x runtime improvement in miniapplications and over 6% in OpenFOAM, a complex production application.This paper received funding from the Intel-BSC Exascale Laboratory SoW 5.1, the European Union’s Horizon 2020 research and innovation program under the Marie Sklodowska-Curie grant agreement No. 749516, the EPEEC project from the European Union’s Horizon 2020 research and innovation program under grant agreement No 801051, the DEEP-SEA project from the European Commission’s EuroHPC program under grant agreement 955606, and the Ministerio de Ciencia e Innovacion—Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033).Peer ReviewedPostprint (author's final draft

    Direct Inter-Process Communication (dIPC): Repurposing the CODOMs architecture to accelerate IPC

    No full text
    In current architectures, page tables are the fundamental mechanism that allows contemporary OSs to isolate user processes, binding each thread to a specific page table. A thread cannot therefore directly call another process's function or access its data; instead, the OS kernel provides data communication primitives and mediates process synchronization through inter-process communication (IPC) channels, which impede system performance. Alternatively, the recently proposed CODOMs architecture provides memory protection across software modules. Threads can cross module protection boundaries inside the same process using simple procedure calls, while preserving memory isolation. We present dIPC (for "direct IPC"), an OS extension that repurposes and extends the CODOMs architecture to allow threads to cross process boundaries. It maps processes into a shared address space, and eliminates the OS kernel from the critical path of inter-process communication. dIPC is 64.12× faster than local remote procedure calls (RPCs), and 8.87× faster than IPC in the L4 microkernel. We show that applying dIPC to a multi-tier OLTP web server improves performance by up to 5.12× (2.13× on average), and reaches over 94% of the ideal system efficiency.We thank Diego Marr´on for helping with MariaDB, the anonymous reviewers for their feedback and, especially, Andrew Baumann for helping us improve the paper. This research was partially funded by HiPEAC through a collaboration grant for Lluís Vilanova (agreement number 687698 for the EU’s Horizon2020 research and innovation programme), the Israel Science Fundation (ISF grant 769/12) and the Israeli Ministry of Science, Technology and Space.Peer Reviewe
    corecore